03. RNN History
A bit of history
How did the theory behind RNNs evolve? Where were we a few years ago, and where are we now?
As mentioned in this video, RNNs have a key flaw: capturing relationships that span more than 8 or 10 time steps back is practically impossible. This flaw stems from the "vanishing gradient" problem, in which the contribution of information decays geometrically over time.
What does this mean?
As you may recall, while training our network we use backpropagation. In the backpropagation process we adjust our weight matrices using gradients. These gradients are calculated by repeatedly multiplying derivatives together. The values of these derivatives may be so small that the repeated multiplications cause the gradient to practically "vanish".
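To get a feel for the numbers, here is a minimal sketch in plain Python. It assumes a hypothetical per-step derivative magnitude of 0.4 (the exact value is made up; anything less than 1 behaves the same way) and shows how repeated multiplication shrinks the gradient contribution from earlier time steps toward zero.

```python
# Minimal illustration of the vanishing gradient effect.
# Assumption: each backpropagation step through time multiplies the gradient
# by a derivative of roughly 0.4 (a hypothetical value; any magnitude < 1 works).

derivative = 0.4        # assumed per-step derivative magnitude
contribution = 1.0      # gradient contribution from the most recent step

for step in range(1, 11):
    contribution *= derivative
    print(f"Contribution from {step} steps back: {contribution:.6f}")

# After 10 steps the contribution is about 0.0001, so information from those
# earlier steps barely affects the weight updates -- it has effectively vanished.
```

This is exactly the geometric decay mentioned above: each additional step back multiplies the contribution by the same factor, so the values form a geometric series that shrinks exponentially.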
LSTM is one option to overcome the Vanishing Gradient problem in RNNs.
Please use these resources if you would like to read more about the Vanishing Gradient problem or to further understand the concept of a Geometric Series and how its terms can decrease exponentially.
If you are still curious, for more information on the important milestones mentioned here, please take a peek at the following links:
Here is the original Elman Network publication from 1990. This link is provided here as it's a significant milestone in the world of RNNs. To simplify things a bit, you can take a look at the following additional info.
In this LSTM link you will find the original paper written by Sepp Hochreiter and Jürgen Schmidhuber. Don't get into all the details just yet. We will cover all of this later!
As mentioned in the video, Long Short-Term Memory Cells (LSTMs) and Gated Recurrent Units (GRUs) provide a solution to the vanishing gradient problem, allowing us to build networks that capture temporal dependencies. In this lesson we will focus on RNNs and continue with LSTMs. We will not be focusing on GRUs.
More information about GRUs can be found in the following blog. Focus on the section titled "GRUs".